WebClass: Web Document Classification Using Modified Decision Trees

Author

  • Wen-Chen Hu
Abstract

Searching for Web sites is one of the most common tasks performed on the Web. Web page classification is the first step in constructing a Web search service. This paper proposes a system, named WebClass, for classifying Web documents by using a height-three modified decision tree that splits the root, depth-one nodes, and depth-two nodes on keywords, descriptions, and hyperlinks, respectively. To classify a URL, start at the root of the decision tree and trace paths downward to the leaves, which give the categories the URL belongs to. A comparison of manual classification with WebClass shows that the latter achieves over 73% accuracy relative to human classification.
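The traversal described in the abstract can be illustrated with a minimal sketch. The abstract does not say how each split is tested, so the substring and set-membership tests below are assumptions, and the names `Page`, `Node`, `matches`, and `classify`, along with the example tree and categories, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Page:
    """Hypothetical container for the three features named in the abstract."""
    url: str
    keywords: Set[str]
    description: str
    hyperlinks: List[str]


@dataclass
class Node:
    """A tree node; `term` labels the edge leading into this node."""
    term: str = ""
    children: List["Node"] = field(default_factory=list)
    category: Optional[str] = None  # set only on leaves (depth three)


def matches(page: Page, depth: int, term: str) -> bool:
    """Assumed split tests: keywords at the root, descriptions at depth one,
    hyperlinks at depth two."""
    if depth == 0:
        return term in page.keywords
    if depth == 1:
        return term in page.description.lower()
    return any(term in link.lower() for link in page.hyperlinks)


def classify(page: Page, node: Node, depth: int = 0) -> Set[str]:
    """Trace every matching path from `node` down to the leaves and
    collect the categories found there."""
    if not node.children:
        return {node.category} if node.category else set()
    found: Set[str] = set()
    for child in node.children:
        if matches(page, depth, child.term):
            found |= classify(page, child, depth + 1)
    return found


# Hypothetical height-three tree: keyword -> description -> hyperlink -> leaf.
tree = Node(children=[
    Node("python", children=[
        Node("tutorial", children=[
            Node("docs", category="Programming/Tutorials"),
        ]),
        Node("snake", children=[
            Node("zoo", category="Animals/Reptiles"),
        ]),
    ]),
])

page = Page(
    url="https://example.org/learn",
    keywords={"python", "programming"},
    description="A beginner tutorial for the Python language",
    hyperlinks=["https://example.org/docs/intro"],
)

print(classify(page, tree))
```

Because `classify` follows every matching branch rather than a single one, a URL can end up in more than one category, which is consistent with the abstract's wording that the leaves "give the categories the URL belongs to".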

Similar resources

Language Independent Named Entity Classification by Modified Transformation-Based Learning and by Decision Tree

We describe our last results at the CoNLL2002 shared task of Named Entity Recognition and Classification using two approaches that we first applied to other NLL problems. We have been developing our own modified TBL learner initially to tackle the Part-of-Speech tagging problem, for integration in a hybrid NLL and rule-based system for information extraction (Ciravegna et al., 1999). After encourag...

Using Causal Knowledge to Learn More Useful Decision Rules From Data

One of the most popular and enduring paradigms in the intersection of machine learning and computational statistics is the use of recursive-partitioning or "tree-structured" methods to "learn" classification trees from data sets (Buntine, 1993; Quinlan, 1986). This approach applies to independent variables of all scale types (binary, categorical, ordered categorical, and continuous) and to noisy...

Using Model Trees for Classification

Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be applied to classification problems by employing a standard method of transforming a classification problem into a problem of function approximation. Surprisingly, using this simple transformation the mo...

Parallel Formulations of Decision-Tree Classification Algorithms

Classification decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, etc. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to ...

Preliminary Investigations into Interactive Classification in Description Logics

Interactive classification in description logics is the process of querying a user to obtain information about an individual in a description logic knowledge base. A complex formalization of this process takes into account the expected costs of the entire decision tree to determine the best placement of the individual. A simpler, and easier to compute, formalization uses costs only for determin...

Publication date: 1999